Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia
Smith, Chandler, Abdulhai, Marwa, Diaz, Manfred, Tesic, Marko, Trivedi, Rakshit S., Vezhnevets, Alexander Sasha, Hammond, Lewis, Clifton, Jesse, Chang, Minsuk, Duéñez-Guzmán, Edgar A., Agapiou, John P., Matyas, Jayd, Karmon, Danny, Kundu, Akash, Korshuk, Aliaksei, Ananya, Ananya, Rahman, Arrasy, Kulandaivel, Avinaash Anand, McHale, Bain, Zhang, Beining, Alexander, Buyantuev, Rojas, Carlos Saith Rodriguez, Wang, Caroline, Talele, Chetan, Liu, Chenao, Lin, Chichen, Riazi, Diana, Shi, Di Yang, Tewolde, Emanuel, Tennant, Elizaveta, Zhong, Fangwei, Cui, Fuyang, Zhao, Gang, Piqueras, Gema Parreño, Yun, Hyeonggeun, Makarov, Ilya, Cui, Jiaxun, Purbey, Jebish, Dilkes, Jim, Nguyen, Jord, Xiao, Lingyun, Giraldo, Luis Felipe, Chacon-Chamorro, Manuela, Beltran, Manuel Sebastian Rios, Segura, Marta Emili García, Wang, Mengmeng, Alim, Mogtaba, Quijano, Nicanor, Schiavone, Nico, Macmillan-Scott, Olivia, Peña, Oswaldo, Stone, Peter, Kadiyala, Ram Mohan Rao, Fernandez, Rolando, Manrique, Ruben, Lu, Sunjia, McIlraith, Sheila A., Dhuri, Shamika, Shi, Shuqing, Gupta, Siddhant, Sarangi, Sneheel, Subramanian, Sriram Ganapathi, Cha, Taehun, Klassen, Toryn Q., Tu, Wenming, Fan, Weijian, Ruiyang, Wu, Feng, Xue, Du, Yali, Liu, Yang, Wang, Yiding, Kang, Yipeng, Sung, Yoonchang, Chen, Yuxuan, Zhang, Zhaowei, Wang, Zhihan, Wu, Zhiqiang, Chen, Ziang, Zheng, Zilong, Jia, Zixia, Wang, Ziyan, Hadfield-Menell, Dylan, Jaques, Natasha, Baarslag, Tim, Hernandez-Orallo, Jose, Leibo, Joel Z.
Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction and are increasingly being deployed in situations where they might engage with both human and artificial agents. These interactions represent a critical frontier for LLM-based agents, yet existing evaluation methods fail to measure how well these capabilities generalize to novel social situations. In this paper, we introduce a method for evaluating the ability of LLM-based agents to cooperate in zero-shot, mixed-motive environments using Concordia, a natural language multi-agent simulation environment. Our method measures general cooperative intelligence by testing an agent's ability to identify and exploit opportunities for mutual gain across diverse partners and contexts. We present empirical results from the NeurIPS 2024 Concordia Contest, where agents were evaluated on their ability to achieve mutual gains across a suite of diverse scenarios ranging from negotiation to collective action problems. Our findings reveal significant gaps between current agent capabilities and the robust generalization required for reliable cooperation, particularly in scenarios demanding persuasion and norm enforcement.
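The zero-shot cooperation metric described above — scoring an agent by its ability to achieve mutual gains across a pool of unseen partners — can be illustrated with a toy matrix-game harness. Everything here is a hypothetical sketch (the payoffs, the scripted partner policies, and the function names are illustrative), not the Concordia API or the contest's actual scoring:

```python
# Toy mixed-motive game: each round both players pick "C" (cooperate)
# or "D" (defect). Mutual cooperation beats mutual defection, but
# defecting against a cooperator pays the most individually.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def always_cooperate(history):
    return "C"

def always_defect(history):
    return "D"

def tit_for_tat(history):
    # Mirror the partner's previous move; cooperate on the first round.
    return history[-1][1] if history else "C"

def play(agent, partner, rounds=10):
    """Return the agent's total payoff against one partner."""
    history = []  # list of (agent_move, partner_move)
    total = 0
    for _ in range(rounds):
        a = agent(history)
        p = partner([(pm, am) for am, pm in history])  # partner sees mirrored history
        total += PAYOFF[(a, p)][0]
        history.append((a, p))
    return total

def cooperative_score(agent, partners, rounds=10):
    """Zero-shot evaluation: average payoff across a pool of diverse partners."""
    return sum(play(agent, p, rounds) for p in partners) / len(partners)

partners = [always_cooperate, always_defect, tit_for_tat]
score = cooperative_score(tit_for_tat, partners)
```

An agent that only maximizes against one fixed partner can score poorly under this average, which is the sense in which the metric targets generalization rather than exploitation of a single opponent.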
Theory of Mind Using Active Inference: A Framework for Multi-Agent Cooperation
Pitliya, Riddhi J., Çatal, Ozan, Van de Maele, Toon, Pezzato, Corrado, Verbelen, Tim
Theory of Mind (ToM) -- the ability to understand that others can have differing knowledge and goals -- enables agents to reason about others' beliefs while planning their own actions. We present a novel approach to multi-agent cooperation by implementing ToM within active inference. Unlike previous active inference approaches to multi-agent cooperation, our method neither relies on task-specific shared generative models nor requires explicit communication. In our framework, ToM-equipped agents maintain distinct representations of their own and others' beliefs and goals. ToM agents then use an extended and adapted version of the sophisticated inference tree-based planning algorithm to systematically explore joint policy spaces through recursive reasoning. We evaluate our approach through collision avoidance and foraging simulations. Results suggest that ToM agents cooperate better than their non-ToM counterparts, avoiding collisions and reducing redundant effort. Crucially, ToM agents accomplish this by inferring others' beliefs solely from observable behaviour and considering them when planning their own actions. Our approach shows potential for generalisable and scalable multi-agent systems while providing computational insights into ToM mechanisms.
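The core idea — predicting the other agent's action from a belief about its goal, then planning around that prediction — can be sketched with a deliberately minimal 1D grid example. This is a hypothetical illustration of first-order ToM planning, not the paper's active-inference or tree-search implementation:

```python
def greedy_step(pos, goal):
    """One-step greedy policy: move one cell toward the goal."""
    if pos < goal:
        return pos + 1
    if pos > goal:
        return pos - 1
    return pos

def tom_step(my_pos, my_goal, other_pos, other_goal_belief):
    """Plan my move after simulating the other agent's move under my
    belief about its goal (first-order Theory of Mind)."""
    predicted_other = greedy_step(other_pos, other_goal_belief)
    # Candidate moves, preferring progress toward my own goal.
    candidates = sorted([my_pos + 1, my_pos - 1, my_pos],
                        key=lambda p: abs(p - my_goal))
    for p in candidates:
        if p != predicted_other:   # avoid the predicted collision cell
            return p
    return my_pos
```

With agent A at cell 0 heading to 5 and agent B at cell 2 heading to 0, two greedy agents would both step into cell 1; the ToM agent predicts that and waits a step instead, which is the collision-avoidance behaviour the abstract describes at toy scale.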
Beyond In-Distribution Performance: A Cross-Dataset Study of Trajectory Prediction Robustness
Yao, Yue, Goehring, Daniel, Reichardt, Joerg
The robustness of trajectory prediction is essential for practical applications in autonomous driving. The advancement of trajectory prediction models is catalyzed through public motion datasets and associated competitions, such as Argoverse 2 (A2) [1] and Waymo Open Motion (WO) [2]. These competitions establish standardized metrics and test protocols and score predictions on test data that is withheld from all competitors and hosted on protected evaluation servers only. This is intended to objectively compare the generalization ability of models to unseen data. However, these withheld test examples still share similarities with the training samples, such as the sensor setup, map representation, post-processing, and the geographic and scenario-selection biases employed during dataset creation. Consequently, the test scores reported in each competition are examples of In-Distribution (ID) testing. To effectively evaluate model generalization, it is essential to test models on truly Out-of-Distribution (OoD) test samples, such as those from different motion datasets. We investigate model generalization across two large-scale motion datasets [3]: Argoverse 2 (A2) and Waymo Open Motion (WO). The WO dataset, with 576k scenarios, is more than twice the size of A2, which contains 250k scenarios.
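The ID-versus-OoD distinction above can be made concrete with a toy numerical protocol: train a predictor on data from one "dataset", test it on a withheld split with the same biases (ID) and on a split with shifted biases (OoD), and compare errors. The data and model here are synthetic stand-ins, not A2 or WO:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(n, curvature):
    """Synthetic stand-in for a motion dataset: features -> next position.
    `curvature` plays the role of a dataset-specific bias (sensor setup,
    scenario selection, ...)."""
    t = rng.uniform(0, 1, size=(n, 1))
    X = np.hstack([t, t**2])
    y = t[:, 0] + curvature * t[:, 0] ** 2 + rng.normal(0, 0.01, n)
    return X, y

def fit(X, y):
    # Simple linear least-squares predictor.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

X_tr, y_tr = make_dataset(500, curvature=0.2)      # training set
X_id, y_id = make_dataset(200, curvature=0.2)      # withheld ID test split
X_ood, y_ood = make_dataset(200, curvature=1.0)    # OoD test set (shifted bias)

w = fit(X_tr, y_tr)
id_err, ood_err = mse(w, X_id, y_id), mse(w, X_ood, y_ood)
```

The withheld ID split reports a flattering score because it shares the training distribution's biases; only the shifted split exposes the generalization gap — the same argument the paper makes for cross-dataset testing.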
Improving Out-of-Distribution Generalization of Trajectory Prediction for Autonomous Driving via Polynomial Representations
Yao, Yue, Yan, Shengchao, Goehring, Daniel, Burgard, Wolfram, Reichardt, Joerg
Robustness against Out-of-Distribution (OoD) samples is a key performance indicator of a trajectory prediction model. However, the development and ranking of state-of-the-art (SotA) models are driven by their In-Distribution (ID) performance on individual competition datasets. We present an OoD testing protocol that homogenizes datasets and prediction tasks across two large-scale motion datasets. We introduce a novel prediction algorithm based on polynomial representations for agent trajectory and road geometry on both the input and output sides of the model. With a much smaller model size, training effort, and inference time, we reach near SotA performance for ID testing and significantly improve robustness in OoD testing. Within our OoD testing protocol, we further study two augmentation strategies of SotA models and their effects on model generalization. Highlighting the contrast between ID and OoD performance, we suggest adding OoD testing to the evaluation criteria of trajectory prediction models.
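The representational idea — describing a trajectory by polynomial coefficients rather than raw sampled points — can be sketched in a few lines. This shows only the representation, under the assumption of an ordinary least-squares fit; it is not the paper's prediction model:

```python
import numpy as np

def to_polynomial(ts, xs, degree=3):
    """Compress a sampled 1D trajectory into polynomial coefficients
    (lowest order first) via least squares."""
    V = np.vander(ts, degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(V, xs, rcond=None)
    return coeffs

def from_polynomial(coeffs, ts):
    """Evaluate the polynomial representation at query timestamps."""
    V = np.vander(ts, len(coeffs), increasing=True)
    return V @ coeffs

# A trajectory that happens to be exactly cubic is recovered losslessly;
# real trajectories are approximated with a small residual.
ts = np.linspace(0.0, 1.0, 20)
xs = 1.0 + 2.0 * ts - 0.5 * ts**3
coeffs = to_polynomial(ts, xs)
recon = from_polynomial(coeffs, ts)
```

Twenty samples collapse to four coefficients here, which hints at why such a representation can support a much smaller model than one operating on raw point sequences.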
Designing Skill-Compatible AI: Methodologies and Frameworks in Chess
Hamade, Karim, McIlroy-Young, Reid, Sen, Siddhartha, Kleinberg, Jon, Anderson, Ashton
Powerful artificial intelligence systems are often used in settings where they must interact with agents that are computationally much weaker, for example when they work alongside humans or operate in complex environments where some tasks are handled by algorithms, heuristics, or other entities of varying computational power. For AI agents to successfully interact in these settings, however, achieving superhuman performance alone is not sufficient; they also need to account for suboptimal actions or idiosyncratic style from their less-skilled counterparts. We propose a formal evaluation framework for assessing the compatibility of near-optimal AI with interaction partners who may have much lower levels of skill; we use popular collaborative chess variants as model systems to study and develop AI agents that can successfully interact with lower-skill entities. Traditional chess engines designed to output near-optimal moves prove to be inadequate partners when paired with engines of various lower skill levels in this domain, as they are not designed to consider the presence of other agents. We contribute three methodologies to explicitly create skill-compatible AI agents in complex decision-making settings, and two chess game frameworks designed to foster collaboration between powerful AI agents and less-skilled partners. On these frameworks, our agents outperform state-of-the-art chess AI (based on AlphaZero) despite being weaker in conventional chess, demonstrating that skill-compatibility is a tangible trait that is qualitatively and measurably distinct from raw performance. Our evaluations further explore and clarify the mechanisms by which our agents achieve skill-compatibility.
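Why skill-compatibility is distinct from raw strength can be shown with a stylized decision model: a "sharp" plan that pays off only if the partner executes it correctly, versus a "robust" plan that tolerates partner error. The payoffs and skill model below are hypothetical, chosen only to make the trade-off visible — they are not drawn from the paper's chess frameworks:

```python
def expected_team_score(plan, partner_skill):
    """Toy model of pairing a planner with a weaker executor.
    'sharp' plans pay off only when the partner executes correctly;
    'robust' plans pay less but do not depend on the partner."""
    payoffs = {"sharp": (10.0, 0.0), "robust": (7.0, 7.0)}
    on_success, on_failure = payoffs[plan]
    return partner_skill * on_success + (1 - partner_skill) * on_failure

def best_plan(partner_skill):
    """Pick the plan maximizing expected team score for this partner."""
    return max(("sharp", "robust"),
               key=lambda plan: expected_team_score(plan, partner_skill))
```

With a reliable partner the sharp plan dominates, but below a skill of 0.7 the nominally weaker robust plan yields a higher team score — the same qualitative effect as a conventionally weaker engine outperforming a near-optimal one when paired with a low-skill partner.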
ProIn: Learning to Predict Trajectory Based on Progressive Interactions for Autonomous Driving
Dong, Yinke, Yuan, Haifeng, Liu, Hongkun, Jing, Wei, Li, Fangzhen, Liu, Hongmin, Fan, Bin
Accurate motion prediction of pedestrians, cyclists, and other surrounding vehicles (collectively called agents) is very important for autonomous driving. Most existing works capture map information through a one-stage interaction with the map via vector-based attention, to provide map constraints for social interaction and multi-modal differentiation. However, these methods must encode all required map rules into the focal agent's feature, so as to retain the paths of all possible intentions while also adapting to potential social interactions. In this work, a progressive interaction network is proposed to enable the agent's feature to progressively focus on relevant maps, in order to better learn agent feature representations that capture the relevant map constraints. The network progressively encodes the complex influence of map constraints into the agent's feature through graph convolutions at three stages: after the historical trajectory encoder, after social interaction, and after multi-modal differentiation. In addition, a weight allocation mechanism is proposed for multi-modal training, so that each mode can obtain learning opportunities from a single-mode ground truth. Experiments validate the superiority of progressive interactions over the existing one-stage interaction and demonstrate the effectiveness of each component, with encouraging results on challenging benchmarks.
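The multi-modal training problem the abstract mentions — K predicted modes but a single-mode ground truth — is commonly handled by winner-takes-all losses, which starve all but the closest mode of gradient. One way to give every mode a learning opportunity is a soft weight allocation over per-mode errors. The abstract does not specify the paper's mechanism, so the softmax scheme below is an assumed illustration of the general idea, not ProIn's actual allocation:

```python
import numpy as np

def mode_weights(errors, temperature=1.0):
    """Soft weight allocation over K predicted modes: modes closer to the
    single ground truth get larger weight, but every mode receives a
    nonzero share (unlike hard winner-takes-all)."""
    logits = -np.asarray(errors, dtype=float) / temperature
    logits -= logits.max()                 # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def multimodal_loss(errors, temperature=1.0):
    """Weighted sum of per-mode errors under the soft allocation."""
    w = mode_weights(errors, temperature)
    return float(np.dot(w, errors))

errors = [0.2, 1.5, 3.0]  # per-mode distance to the single-mode ground truth
w = mode_weights(errors)
```

The temperature interpolates between hard winner-takes-all (temperature → 0) and uniform averaging (temperature → ∞), which is the design axis such allocation schemes typically expose.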